# High-precision Reward Model
Skywork Reward Llama 3.1 8B V0.2
An advanced reward model built on Llama-3.1-8B-Instruct and trained on 80K high-quality preference pairs, with strong performance on preference judgments in complex scenarios.
Large Language Model
Transformers

Skywork
25.99k
35
PPO LunarLander V2
A reinforcement learning model trained with the PPO algorithm to solve the landing task in the LunarLander-v2 environment.
Physics Model
araffin
65
18
Featured Recommended AI Models